Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: TipRouter CLI separate stages #87

Merged
merged 124 commits into from
Mar 11, 2025
Merged

feat: TipRouter CLI separate stages #87

merged 124 commits into from
Mar 11, 2025

Conversation

tomjohn1028
Copy link
Collaborator

@tomjohn1028 tomjohn1028 commented Feb 13, 2025

Change notes

  • New top-level environment variable/CLI arg
    • SAVE_PATH: renaming of META_MERKLE_TREE_DIR. Path where all generated operator files are stored
  • Separates each stage of the process into more coherent functions.
    • run CLI command loops through each stage, saving the outputs to save-path CLI argument. The output from each stage is held in memory to reduce the processing.
      • New Environment Variables/CLI Args for run
        • STARTING_STAGE: Ability to start the operator from a specific stage before it continues
          • one of: load-bank-from-snapshot, create-stake-meta, create-merkle-tree-collection, create-meta-merkle-tree, cast-vote, wait-for-next-epoch
        • SAVE_STAGES: true/false - whether to save the state of intermediate stages in files. Needed for CLAIM_TIPS and SET_MERKLE_ROOT
        • SAVE_SNAPSHOTS: true/false - whether to save the bank loaded in the target as a full snapshot (~100GB).
  • Create CLI commands for each stage
    • snapshot-slot - Create and store a full snapshot for a given slot
    • create-stake-meta - For a given epoch and slot, load the bank and create the StakeMetaCollection file which contains validator vote accounts and their staking delegations.
    • create-merkle-tree-collection - For a given epoch, load the StakeMetaCollection from disk at save-path, generate and save the GeneratedMerkleTreeCollection.
    • create-meta-merkle-tree - For a given epoch, load the GeneratedMerkleTreeCollection from disk at save-path, generate and save theMetaMerkleTree.
    • submit-epoch - For a given epoch, load the MetaMerkle file and cast a vote to the NCN's ballot box. Optionally set the merkle root on the Tip Distribution account if the NCN has consensus for the epoch.
    • claim-tips - For a given epoch, generate and execute Tip Distribution Claim instructions for any unclaimed tips.
  • TipRouter saved files
    • generated files are now saved under save-path
    • Files <= current_epoch - num_monitored_epochs are purged for anything
    • NOTE: file names have change to a _{epoch}file_name.json format. All previous files will not be removed. Operators should manually delete old files.

@tomjohn1028 tomjohn1028 marked this pull request as ready for review February 16, 2025 15:24
@tomjohn1028 tomjohn1028 requested a review from ebatsell February 16, 2025 15:25
Copy link
Collaborator

@ebatsell ebatsell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add something that will purge the saved files in save_path that are older than “NUM_MONITORED_EPOCHS”? And should make sure that NUM_MONITORED_EPOCHS > 0.

Then we should add something in the release notes that the saved file formats are changing from “file_name_{epoch}.json” to “{epoch}_file_name.json” and will be auto purged, so any files that operators have in the old format they should manually remove

some_stake_meta_collection,
epoch_to_process,
ncn_address,
PROTOCOL_FEE_BPS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull this from the NCN

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops this was an oversight.

Based on context from generator where these fees are actually deducted from tip_distribution_meta.total_tips I'm pretty sure it should be total_fee_bps() (like the following). But want to confirm

let config = get_ncn_config(&rpc_client, &tip_router_program_id, &ncn_address).await?;
let current_fees = config.fee_config.current_fees(epoch);
let protocol_fee_bps = current_fees.total_fees_bps()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: a1c33dc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epoch should be:
tip_router_target_epoch = epoch + 1

Confusing convention but:
"target epoch" refers to the epoch that the snapshot is being created for (so usually the previous epoch)
"tip router target epoch" refers to the epoch after that which the on-chain program uses for that data

E.g. right now is epoch 10
"target epoch" is epoch 9, so we get 9_meta_merkle_tree.json, etc
But the ballot box, etc on chain for this meta_merkle_root will be in BallotBox::find_program_address(ncn, 10)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh that is confusing and great to know! Updated: 74b252a

// TODO: REVIEW should we expect snapshots to be in the save path rather than
// the typical validator snapshots path? This would save the fight from the
// validator process removing snapshots. We'd have to also update the snapshot
// process and CLI to handle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we want to stop fighting the validator purging snapshots so save path is the best option - alternatively could we do something like: change the OperatorState enum to OperatorState::CreateStakeMeta(Option) where the inner value is the full snapshot path, if no bank is passed in? so operators would technically have a choice

Copy link
Collaborator Author

@tomjohn1028 tomjohn1028 Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant with data causes clap::ValueEnum derive macro to fail. I think a separate CLI argument may make the most sense.

What I've determined is anywhere load_bank_from_snapshot is called, we assume the user is trying to save a new snapshot (or load the bank without having the snapshot). So in this case we should be using the existing full snapshots to load the bank.

Then where get_bank_from_snapshot_at_slot is used, we expect the snapshot at the exact slot to already exist. So here we should use the save-path as the full snapshot path. Which holds true if we save the full snapshots at save-path during creation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get my proposed changes up for review shortly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@tomjohn1028 tomjohn1028 Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing some oddities with some snpashot file placement with this. Looking into it. Everything else seemed to be running fine at the last epoch boundary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! that logic makes sense for the variant issue

starting_stage: OperatorState,

#[arg(long, env, default_value = "true")]
save_stages: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have one other for save_snapshot: bool that defaults to false just because the snapshots are so much bigger than the other stages

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed enable_snapshots => save_snapshot since that feature flag already existed and added an alias

@ebatsell ebatsell enabled auto-merge (squash) March 11, 2025 17:09
@ebatsell ebatsell merged commit d27f552 into master Mar 11, 2025
6 checks passed
@ebatsell ebatsell deleted the separate-stages branch March 11, 2025 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants